Effects of audio and ASR quality on cepstral and high-level speaker verification systems
Abstract
Speech data for NIST speaker recognition evaluations has traditionally been distributed in compressed, telephone-quality form, even for microphone data that was originally recorded at higher quality. We evaluate the effect that improved audio quality has on speaker verification performance, using a recently released full-bandwidth version of microphone data from the SRE2010 evaluation. Remarkably, we find substantially improved results even though the underlying speaker recognition models remain based on a telephone-band feature front end. For a cepstral GMM system we show improvements purely from the elimination of lossy (μ-law) coding and more effective noise reduction filtering at the full bandwidth. We also find that higher-level speaker recognition systems can benefit from better ASR quality enabled by the improved audio quality. Specifically, we show that a speech recognizer trained on full-bandwidth, distant-microphone meeting speech data yields reduced speaker verification error for speaker models based on MLLR features and word-N-gram features.
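To illustrate why μ-law coding is lossy, the sketch below compresses a sample with the continuous μ-law formula, quantizes it to 8 bits, and expands it again. It is a minimal illustration under stated assumptions, not the codec or pipeline used in the paper: the function names `mulaw_encode` and `mulaw_decode` are hypothetical, μ = 255 is assumed, and the uniform 8-bit quantizer is a simplification that is not bit-exact with the segmented G.711 encoding.

```python
import math

MU = 255  # assumed companding constant (the usual value for 8-bit mu-law)


def mulaw_encode(x: float, mu: int = MU) -> int:
    """Compress a sample in [-1, 1] and quantize it to an 8-bit code (0..255).

    Hypothetical helper: uses the continuous mu-law curve followed by
    uniform quantization, not the segmented G.711 bit layout.
    """
    x = max(-1.0, min(1.0, x))  # clip to the valid input range
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1.0) / 2.0 * 255))


def mulaw_decode(code: int, mu: int = MU) -> float:
    """Map an 8-bit code back to an approximate sample in [-1, 1]."""
    y = 2.0 * code / 255 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


if __name__ == "__main__":
    for sample in (0.001, 0.01, 0.1, 0.5, 0.9):
        restored = mulaw_decode(mulaw_encode(sample))
        # The round trip does not return the original value exactly.
        print(f"{sample:.4f} -> {restored:.6f} (error {abs(restored - sample):.6f})")
```

The round-trip error is the quantization noise that a full-bandwidth, uncompressed release avoids; its size depends on the sample amplitude because the companding curve spends more code values on small amplitudes.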
Similar resources
Augmenting short-term cepstral features with long-term discriminative features for speaker verification of telephone data
Short-term cepstral features have long been chosen as standard features for speaker recognition thanks to their relevance and effectiveness. In contrast, discriminative features, calculated by a multi-layer perceptron (MLP) from much longer stretches of time, have been gradually adopted in automatic speech recognition (ASR). It has been shown that augmenting short-term cepstral features with lo...
A Review of Various Score Normalization Techniques for Speaker Identification System
This paper presents an overview of a state-of-the-art text-independent speaker verification system using score normalization. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly used speech parameterization in speaker verification, namely cepstral analysis, is detailed. Normalization of scores is then explai...
ASR Dependent Techniques for Speaker Recognition
This thesis is concerned with improving the performance of speaker recognition systems in three areas: speaker modeling, verification score computation, and feature extraction in telephone quality speech. We first seek to improve upon traditional modeling approaches for speaker recognition, which are based on Gaussian Mixture Models (GMMs) trained globally over all speech from a given speaker. ...
Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech
In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis have renewed interest in this problem. We use an HMM-based speech synthesizer, which creates synthetic speech for a targeted speaker through adaptation of a background model and a ...
MFCC and CMN Based Speaker Recognition in Noisy Environment
The performance of an automatic speaker recognition (ASR) system degrades drastically in the presence of noise and other distortions, especially when there is a noise level mismatch between the training and testing environments. This paper explores the problem of speaker recognition in noisy conditions, assuming that speech signals are corrupted by noise. A major problem of most speaker recognitio...